Overview

Dataset statistics

Number of variables: 14
Number of observations: 1350
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): 0.1%
Total size in memory: 147.8 KiB
Average record size in memory: 112.1 B

Variable types

Numeric: 9
Categorical: 5

Alerts

Dataset has 1 (0.1%) duplicate rows [Duplicates]
Overall Qual is highly correlated with Gr Liv Area and 7 other fields [High correlation]
Gr Liv Area is highly correlated with Overall Qual and 3 other fields [High correlation]
Garage Cars is highly correlated with Overall Qual and 6 other fields [High correlation]
Garage Area is highly correlated with Overall Qual and 4 other fields [High correlation]
Total Bsmt SF is highly correlated with 1st Flr SF and 1 other field [High correlation]
1st Flr SF is highly correlated with Total Bsmt SF and 1 other field [High correlation]
Full Bath is highly correlated with Overall Qual and 5 other fields [High correlation]
Year Built is highly correlated with Overall Qual and 6 other fields [High correlation]
Year Remod/Add is highly correlated with Overall Qual and 3 other fields [High correlation]
Garage Yr Blt is highly correlated with Overall Qual and 6 other fields [High correlation]
target is highly correlated with Overall Qual and 9 other fields [High correlation]
Overall Qual is highly correlated with Gr Liv Area and 8 other fields [High correlation]
Gr Liv Area is highly correlated with Overall Qual and 4 other fields [High correlation]
Garage Cars is highly correlated with Overall Qual and 6 other fields [High correlation]
Garage Area is highly correlated with Overall Qual and 3 other fields [High correlation]
Total Bsmt SF is highly correlated with Overall Qual and 2 other fields [High correlation]
1st Flr SF is highly correlated with Gr Liv Area and 2 other fields [High correlation]
Full Bath is highly correlated with Overall Qual and 4 other fields [High correlation]
Year Built is highly correlated with Overall Qual and 5 other fields [High correlation]
Year Remod/Add is highly correlated with Overall Qual and 3 other fields [High correlation]
Garage Yr Blt is highly correlated with Overall Qual and 5 other fields [High correlation]
target is highly correlated with Overall Qual and 9 other fields [High correlation]
Overall Qual is highly correlated with Garage Cars and 3 other fields [High correlation]
Gr Liv Area is highly correlated with Full Bath and 1 other field [High correlation]
Garage Cars is highly correlated with Overall Qual and 4 other fields [High correlation]
Garage Area is highly correlated with Garage Cars [High correlation]
Total Bsmt SF is highly correlated with 1st Flr SF [High correlation]
1st Flr SF is highly correlated with Total Bsmt SF [High correlation]
Full Bath is highly correlated with Overall Qual and 3 other fields [High correlation]
Year Built is highly correlated with Overall Qual and 2 other fields [High correlation]
Year Remod/Add is highly correlated with Year Built and 1 other field [High correlation]
Garage Yr Blt is highly correlated with Garage Cars and 2 other fields [High correlation]
target is highly correlated with Overall Qual and 3 other fields [High correlation]
Exter Qual is highly correlated with Kitchen Qual [High correlation]
Kitchen Qual is highly correlated with Exter Qual [High correlation]
Overall Qual is highly correlated with Gr Liv Area and 9 other fields [High correlation]
Gr Liv Area is highly correlated with Overall Qual and 4 other fields [High correlation]
Exter Qual is highly correlated with Overall Qual and 5 other fields [High correlation]
Garage Cars is highly correlated with Overall Qual and 8 other fields [High correlation]
Garage Area is highly correlated with Overall Qual and 8 other fields [High correlation]
Kitchen Qual is highly correlated with Overall Qual and 4 other fields [High correlation]
Total Bsmt SF is highly correlated with Gr Liv Area and 5 other fields [High correlation]
1st Flr SF is highly correlated with Gr Liv Area and 4 other fields [High correlation]
Bsmt Qual is highly correlated with Overall Qual and 9 other fields [High correlation]
Full Bath is highly correlated with Overall Qual and 8 other fields [High correlation]
Year Built is highly correlated with Overall Qual and 9 other fields [High correlation]
Year Remod/Add is highly correlated with Exter Qual and 6 other fields [High correlation]
Garage Yr Blt is highly correlated with Overall Qual and 7 other fields [High correlation]
target is highly correlated with Overall Qual and 12 other fields [High correlation]

Reproduction

Analysis started: 2022-02-03 08:30:43.668184
Analysis finished: 2022-02-03 08:31:10.901057
Duration: 27.23 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

Overall Qual
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.208888889
Minimum: 2
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 5
median: 6
Q3: 7
95-th percentile: 9
Maximum: 10
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.338015008
Coefficient of variation (CV): 0.2154999118
Kurtosis: -0.07763476292
Mean: 6.208888889
Median Absolute Deviation (MAD): 1
Skewness: 0.345323694
Sum: 8382
Variance: 1.790284161
Monotonicity: Not monotonic
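
Several of the figures reported for Overall Qual follow from the others by standard definitions: the mean is the sum divided by the number of observations, the range is maximum minus minimum, the IQR is Q3 minus Q1, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The following is a minimal sketch that re-derives them from the reported values; the variable names are illustrative and not part of the report.

```python
# Values copied from the Overall Qual statistics in this report.
n_observations = 1350
total = 8382
minimum, maximum = 2, 10
q1, q3 = 5, 7
std_dev = 1.338015008

mean = total / n_observations        # reported mean: 6.208888889
value_range = maximum - minimum      # reported range: 8
iqr = q3 - q1                        # reported IQR: 2
variance = std_dev ** 2              # reported variance: 1.790284161
coeff_variation = std_dev / mean     # reported CV: 0.2154999118

print(f"mean={mean:.9f} range={value_range} iqr={iqr}")
print(f"variance={variance:.9f} cv={coeff_variation:.10f}")
```

The same relations can be used as a quick sanity check on any of the numeric variables in this report.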
Histogram with fixed size bins (bins=9)

Common values

Value | Count | Frequency (%)
5 | 379 | 28.1%
6 | 351 | 26.0%
7 | 307 | 22.7%
8 | 156 | 11.6%
4 | 75 | 5.6%
9 | 56 | 4.1%
10 | 15 | 1.1%
3 | 8 | 0.6%
2 | 3 | 0.2%

Smallest values

Value | Count | Frequency (%)
2 | 3 | 0.2%
3 | 8 | 0.6%
4 | 75 | 5.6%
5 | 379 | 28.1%
6 | 351 | 26.0%
7 | 307 | 22.7%
8 | 156 | 11.6%
9 | 56 | 4.1%
10 | 15 | 1.1%

Largest values

Value | Count | Frequency (%)
10 | 15 | 1.1%
9 | 56 | 4.1%
8 | 156 | 11.6%
7 | 307 | 22.7%
6 | 351 | 26.0%
5 | 379 | 28.1%
4 | 75 | 5.6%
3 | 8 | 0.6%
2 | 3 | 0.2%

Gr Liv Area
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 813
Distinct (%): 60.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1513.542222
Minimum: 480
Maximum: 4476
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 480
5-th percentile: 869.8
Q1: 1144
median: 1445.5
Q3: 1774.5
95-th percentile: 2498.1
Maximum: 4476
Range: 3996
Interquartile range (IQR): 630.5

Descriptive statistics

Standard deviation: 487.5232386
Coefficient of variation (CV): 0.3221074585
Kurtosis: 1.516308644
Mean: 1513.542222
Median Absolute Deviation (MAD): 313.5
Skewness: 0.9803084672
Sum: 2043282
Variance: 237678.9081
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
864 | 19 | 1.4%
1040 | 12 | 0.9%
1092 | 12 | 0.9%
894 | 11 | 0.8%
1200 | 9 | 0.7%
1456 | 9 | 0.7%
1224 | 7 | 0.5%
960 | 7 | 0.5%
848 | 7 | 0.5%
988 | 7 | 0.5%
Other values (803) | 1250 | 92.6%

Smallest values

Value | Count | Frequency (%)
480 | 1 | 0.1%
520 | 1 | 0.1%
540 | 1 | 0.1%
572 | 1 | 0.1%
630 | 2 | 0.1%
672 | 2 | 0.1%
694 | 1 | 0.1%
747 | 1 | 0.1%
764 | 1 | 0.1%
765 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
4476 | 1 | 0.1%
3627 | 1 | 0.1%
3608 | 1 | 0.1%
3194 | 1 | 0.1%
3140 | 1 | 0.1%
3082 | 1 | 0.1%
3005 | 1 | 0.1%
2978 | 1 | 0.1%
2956 | 1 | 0.1%
2945 | 1 | 0.1%

Exter Qual
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 10.7 KiB

TA: 808
Gd: 485
Ex: 49
Fa: 8

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Ex
2nd row: Gd
3rd row: TA
4th row: TA
5th row: Gd

Common Values

Value | Count | Frequency (%)
TA | 808 | 59.9%
Gd | 485 | 35.9%
Ex | 49 | 3.6%
Fa | 8 | 0.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
ta | 808 | 59.9%
gd | 485 | 35.9%
ex | 49 | 3.6%
fa | 8 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

Garage Cars
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 10.7 KiB

2: 794
1: 372
3: 172
4: 11
5: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 3
2nd row: 2
3rd row: 1
4th row: 2
5th row: 3

Common Values

Value | Count | Frequency (%)
2 | 794 | 58.8%
1 | 372 | 27.6%
3 | 172 | 12.7%
4 | 11 | 0.8%
5 | 1 | 0.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
2 | 794 | 58.8%
1 | 372 | 27.6%
3 | 172 | 12.7%
4 | 11 | 0.8%
5 | 1 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

Garage Area
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 454
Distinct (%): 33.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 502.0148148
Minimum: 100
Maximum: 1488
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 100
5-th percentile: 240
Q1: 368
median: 484
Q3: 588
95-th percentile: 865.55
Maximum: 1488
Range: 1388
Interquartile range (IQR): 220

Descriptive statistics

Standard deviation: 191.3899564
Coefficient of variation (CV): 0.3812436422
Kurtosis: 1.522404855
Mean: 502.0148148
Median Absolute Deviation (MAD): 110.5
Skewness: 0.8864540664
Sum: 677720
Variance: 36630.11542
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
440 | 47 | 3.5%
576 | 42 | 3.1%
240 | 37 | 2.7%
480 | 36 | 2.7%
484 | 35 | 2.6%
528 | 32 | 2.4%
400 | 28 | 2.1%
264 | 27 | 2.0%
288 | 24 | 1.8%
308 | 19 | 1.4%
Other values (444) | 1023 | 75.8%

Smallest values

Value | Count | Frequency (%)
100 | 1 | 0.1%
160 | 1 | 0.1%
162 | 1 | 0.1%
164 | 1 | 0.1%
180 | 9 | 0.7%
184 | 1 | 0.1%
186 | 1 | 0.1%
195 | 2 | 0.1%
200 | 6 | 0.4%
205 | 2 | 0.1%

Largest values

Value | Count | Frequency (%)
1488 | 1 | 0.1%
1390 | 1 | 0.1%
1356 | 1 | 0.1%
1348 | 1 | 0.1%
1314 | 1 | 0.1%
1231 | 1 | 0.1%
1220 | 1 | 0.1%
1184 | 1 | 0.1%
1138 | 1 | 0.1%
1110 | 1 | 0.1%

Kitchen Qual
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 10.7 KiB

TA: 660
Gd: 560
Ex: 107
Fa: 23

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Ex
2nd row: Gd
3rd row: TA
4th row: Gd
5th row: Gd

Common Values

Value | Count | Frequency (%)
TA | 660 | 48.9%
Gd | 560 | 41.5%
Ex | 107 | 7.9%
Fa | 23 | 1.7%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
ta | 660 | 48.9%
gd | 560 | 41.5%
ex | 107 | 7.9%
fa | 23 | 1.7%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

Total Bsmt SF
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 709
Distinct (%): 52.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1082.644444
Minimum: 105
Maximum: 2660
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 105
5-th percentile: 548.35
Q1: 816
median: 1009
Q3: 1309.5
95-th percentile: 1807.7
Maximum: 2660
Range: 2555
Interquartile range (IQR): 493.5

Descriptive statistics

Standard deviation: 384.0677133
Coefficient of variation (CV): 0.3547496274
Kurtosis: 0.60547131
Mean: 1082.644444
Median Absolute Deviation (MAD): 229.5
Skewness: 0.7442916706
Sum: 1461570
Variance: 147508.0084
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
864 | 37 | 2.7%
768 | 15 | 1.1%
1040 | 15 | 1.1%
672 | 14 | 1.0%
1008 | 13 | 1.0%
912 | 11 | 0.8%
894 | 11 | 0.8%
720 | 10 | 0.7%
816 | 10 | 0.7%
960 | 9 | 0.7%
Other values (699) | 1205 | 89.3%

Smallest values

Value | Count | Frequency (%)
105 | 1 | 0.1%
160 | 1 | 0.1%
190 | 1 | 0.1%
216 | 1 | 0.1%
240 | 1 | 0.1%
264 | 1 | 0.1%
297 | 1 | 0.1%
346 | 1 | 0.1%
356 | 1 | 0.1%
384 | 9 | 0.7%

Largest values

Value | Count | Frequency (%)
2660 | 1 | 0.1%
2535 | 1 | 0.1%
2524 | 1 | 0.1%
2461 | 1 | 0.1%
2458 | 1 | 0.1%
2452 | 1 | 0.1%
2396 | 1 | 0.1%
2392 | 1 | 0.1%
2271 | 1 | 0.1%
2220 | 1 | 0.1%

1st Flr SF
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 736
Distinct (%): 54.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1167.474074
Minimum: 480
Maximum: 2898
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 480
5-th percentile: 672
Q1: 886.25
median: 1092.5
Q3: 1396.5
95-th percentile: 1846.2
Maximum: 2898
Range: 2418
Interquartile range (IQR): 510.25

Descriptive statistics

Standard deviation: 375.061407
Coefficient of variation (CV): 0.3212588744
Kurtosis: 0.7450582024
Mean: 1167.474074
Median Absolute Deviation (MAD): 236.5
Skewness: 0.8317876671
Sum: 1576090
Variance: 140671.059
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
864 | 20 | 1.5%
1040 | 14 | 1.0%
894 | 11 | 0.8%
960 | 11 | 0.8%
848 | 9 | 0.7%
990 | 8 | 0.6%
832 | 8 | 0.6%
630 | 7 | 0.5%
912 | 7 | 0.5%
672 | 7 | 0.5%
Other values (726) | 1248 | 92.4%

Smallest values

Value | Count | Frequency (%)
480 | 1 | 0.1%
483 | 6 | 0.4%
502 | 1 | 0.1%
516 | 1 | 0.1%
520 | 4 | 0.3%
525 | 4 | 0.3%
526 | 2 | 0.1%
530 | 3 | 0.2%
536 | 1 | 0.1%
540 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
2898 | 1 | 0.1%
2524 | 1 | 0.1%
2522 | 1 | 0.1%
2515 | 1 | 0.1%
2497 | 1 | 0.1%
2490 | 1 | 0.1%
2470 | 1 | 0.1%
2452 | 1 | 0.1%
2422 | 1 | 0.1%
2411 | 1 | 0.1%

Bsmt Qual
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 10.7 KiB

TA: 605
Gd: 582
Ex: 134
Fa: 28
Po: 1

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: Ex
2nd row: Ex
3rd row: TA
4th row: TA
5th row: Gd

Common Values

Value | Count | Frequency (%)
TA | 605 | 44.8%
Gd | 582 | 43.1%
Ex | 134 | 9.9%
Fa | 28 | 2.1%
Po | 1 | 0.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
ta | 605 | 44.8%
gd | 582 | 43.1%
ex | 134 | 9.9%
fa | 28 | 2.1%
po | 1 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

Full Bath
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 10.7 KiB

2: 703
1: 612
3: 27
0: 6
4: 2

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 1
4th row: 1
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 703 | 52.1%
1 | 612 | 45.3%
3 | 27 | 2.0%
0 | 6 | 0.4%
4 | 2 | 0.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
2 | 703 | 52.1%
1 | 612 | 45.3%
3 | 27 | 2.0%
0 | 6 | 0.4%
4 | 2 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

Year Built
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 105
Distinct (%): 7.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1972.987407
Minimum: 1880
Maximum: 2010
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 1880
5-th percentile: 1920
Q1: 1955
median: 1976
Q3: 2002
95-th percentile: 2007
Maximum: 2010
Range: 130
Interquartile range (IQR): 47

Descriptive statistics

Standard deviation: 29.30725737
Coefficient of variation (CV): 0.01485425465
Kurtosis: -0.3762728739
Mean: 1972.987407
Median Absolute Deviation (MAD): 23
Skewness: -0.6480337442
Sum: 2663533
Variance: 858.9153343
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
2005 | 71 | 5.3%
2006 | 63 | 4.7%
2003 | 48 | 3.6%
2007 | 48 | 3.6%
2004 | 48 | 3.6%
1977 | 33 | 2.4%
1976 | 27 | 2.0%
2008 | 25 | 1.9%
1978 | 24 | 1.8%
1968 | 24 | 1.8%
Other values (95) | 939 | 69.6%

Smallest values

Value | Count | Frequency (%)
1880 | 3 | 0.2%
1882 | 1 | 0.1%
1885 | 2 | 0.1%
1890 | 3 | 0.2%
1892 | 1 | 0.1%
1893 | 1 | 0.1%
1900 | 10 | 0.7%
1908 | 1 | 0.1%
1910 | 17 | 1.3%
1912 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
2010 | 1 | 0.1%
2009 | 14 | 1.0%
2008 | 25 | 1.9%
2007 | 48 | 3.6%
2006 | 63 | 4.7%
2005 | 71 | 5.3%
2004 | 48 | 3.6%
2003 | 48 | 3.6%
2002 | 22 | 1.6%
2001 | 19 | 1.4%

Year Remod/Add
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 61
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1985.099259
Minimum: 1950
Maximum: 2010
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 1950
5-th percentile: 1950
Q1: 1968
median: 1993
Q3: 2004
95-th percentile: 2007
Maximum: 2010
Range: 60
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 20.15324351
Coefficient of variation (CV): 0.01015225985
Kurtosis: -1.236075056
Mean: 1985.099259
Median Absolute Deviation (MAD): 13
Skewness: -0.5010053957
Sum: 2679884
Variance: 406.1532241
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1950 | 134 | 9.9%
2006 | 93 | 6.9%
2005 | 68 | 5.0%
2007 | 67 | 5.0%
2004 | 58 | 4.3%
2003 | 52 | 3.9%
2000 | 43 | 3.2%
2008 | 38 | 2.8%
2002 | 37 | 2.7%
1998 | 35 | 2.6%
Other values (51) | 725 | 53.7%

Smallest values

Value | Count | Frequency (%)
1950 | 134 | 9.9%
1951 | 4 | 0.3%
1952 | 9 | 0.7%
1953 | 6 | 0.4%
1954 | 14 | 1.0%
1955 | 12 | 0.9%
1956 | 12 | 0.9%
1957 | 9 | 0.7%
1958 | 17 | 1.3%
1959 | 9 | 0.7%

Largest values

Value | Count | Frequency (%)
2010 | 5 | 0.4%
2009 | 17 | 1.3%
2008 | 38 | 2.8%
2007 | 67 | 5.0%
2006 | 93 | 6.9%
2005 | 68 | 5.0%
2004 | 58 | 4.3%
2003 | 52 | 3.9%
2002 | 37 | 2.7%
2001 | 23 | 1.7%

Garage Yr Blt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 97
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1978.471852
Minimum: 1900
Maximum: 2207
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 1900
5-th percentile: 1930
Q1: 1961
median: 1978.5
Q3: 2002
95-th percentile: 2007
Maximum: 2207
Range: 307
Interquartile range (IQR): 41

Descriptive statistics

Standard deviation: 25.37727821
Coefficient of variation (CV): 0.01282670673
Kurtosis: 4.149795343
Mean: 1978.471852
Median Absolute Deviation (MAD): 20.5
Skewness: -0.0419405896
Sum: 2670937
Variance: 644.0062493
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
2005 | 70 | 5.2%
2006 | 53 | 3.9%
2007 | 51 | 3.8%
2004 | 50 | 3.7%
2003 | 49 | 3.6%
1977 | 40 | 3.0%
2008 | 32 | 2.4%
1968 | 29 | 2.1%
1993 | 28 | 2.1%
2000 | 27 | 2.0%
Other values (87) | 921 | 68.2%

Smallest values

Value | Count | Frequency (%)
1900 | 2 | 0.1%
1910 | 3 | 0.2%
1914 | 2 | 0.1%
1915 | 2 | 0.1%
1916 | 2 | 0.1%
1918 | 1 | 0.1%
1919 | 1 | 0.1%
1920 | 14 | 1.0%
1921 | 2 | 0.1%
1922 | 6 | 0.4%

Largest values

Value | Count | Frequency (%)
2207 | 1 | 0.1%
2010 | 1 | 0.1%
2009 | 17 | 1.3%
2008 | 32 | 2.4%
2007 | 51 | 3.8%
2006 | 53 | 3.9%
2005 | 70 | 5.2%
2004 | 50 | 3.7%
2003 | 49 | 3.6%
2002 | 24 | 1.8%

target
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 619
Distinct (%): 45.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 186406.3126
Minimum: 12789
Maximum: 745000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.7 KiB

Quantile statistics

Minimum: 12789
5-th percentile: 102450
Q1: 135000
median: 165375
Q3: 217875
95-th percentile: 339381.45
Maximum: 745000
Range: 732211
Interquartile range (IQR): 82875

Descriptive statistics

Standard deviation: 78435.42476
Coefficient of variation (CV): 0.4207766554
Kurtosis: 4.939592831
Mean: 186406.3126
Median Absolute Deviation (MAD): 37750
Skewness: 1.71758567
Sum: 251648522
Variance: 6152115857
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
155000 | 15 | 1.1%
140000 | 14 | 1.0%
145000 | 13 | 1.0%
135000 | 12 | 0.9%
170000 | 12 | 0.9%
144000 | 11 | 0.8%
143000 | 11 | 0.8%
105000 | 10 | 0.7%
165000 | 10 | 0.7%
110000 | 10 | 0.7%
Other values (609) | 1232 | 91.3%

Smallest values

Value | Count | Frequency (%)
12789 | 1 | 0.1%
35311 | 1 | 0.1%
50000 | 1 | 0.1%
55993 | 1 | 0.1%
58500 | 1 | 0.1%
63000 | 1 | 0.1%
64000 | 2 | 0.1%
67000 | 1 | 0.1%
68400 | 1 | 0.1%
68500 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
745000 | 1 | 0.1%
625000 | 1 | 0.1%
615000 | 1 | 0.1%
591587 | 1 | 0.1%
556581 | 1 | 0.1%
535000 | 1 | 0.1%
485000 | 1 | 0.1%
475000 | 2 | 0.1%
470000 | 1 | 0.1%
468000 | 1 | 0.1%

Interactions

[Figure: 81 pairwise interaction plots, one for each ordered pair of the 9 numeric variables; images not preserved.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
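
The calculation described above can be sketched in plain Python: rank both variables (tied values share the average of their positions), then divide the covariance of the ranks by the product of their standard deviations. The function names are illustrative assumptions, and the input is the Overall Qual and target columns from the first ten sample rows of this report, so the result will not match the full-dataset correlation.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def covariance_over_stds(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman_rho(x, y):
    """Spearman's rho: the covariance/std ratio applied to the rank variables."""
    return covariance_over_stds(average_ranks(x), average_ranks(y))

# First ten sample rows of this report (Overall Qual, target).
overall_qual = [10, 7, 5, 5, 7, 8, 6, 8, 8, 8]
target = [386250, 194000, 123000, 135000, 250000,
          269500, 156500, 278000, 421250, 232500]
print(round(spearman_rho(overall_qual, target), 3))
```

Because only ranks enter the calculation, any strictly increasing transformation of either variable leaves ρ unchanged.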

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
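
As a minimal sketch of this definition, the function below divides the sample covariance by the product of the sample standard deviations and then checks the invariance under location and scale changes mentioned above. The function name is an illustrative assumption, and the data are the Gr Liv Area and target columns from the first ten sample rows only.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of their sample stds."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# First ten sample rows of this report (Gr Liv Area, target).
gr_liv_area = [2392, 1352, 900, 1174, 1958, 1968, 1478, 2524, 2649, 1440]
target = [386250, 194000, 123000, 135000, 250000,
          269500, 156500, 278000, 421250, 232500]

r_original = pearson_r(gr_liv_area, target)

# Shift and rescale each variable separately (positive scale factors);
# r should be unchanged up to floating-point rounding.
area_rescaled = [0.0929 * a + 10 for a in gr_liv_area]
target_rescaled = [t / 1000 - 50 for t in target]
r_rescaled = pearson_r(area_rescaled, target_rescaled)

print(round(r_original, 3), round(r_rescaled, 3))
```

A negative scale factor on exactly one of the variables would flip the sign of r while leaving its magnitude unchanged.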

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
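
The pair-counting rule above corresponds to the variant usually called tau-a. A minimal sketch follows, with an illustrative function name; pairs tied in either variable count as neither concordant nor discordant. Libraries such as SciPy default to the tau-b variant, which adjusts the denominator for ties, so results on tied data may differ from this sketch.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y: counted in neither group.
    return (concordant - discordant) / len(pairs)

# Toy data: of the three pairs, two are concordant and one is discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop is quadratic in the number of observations, which is fine for illustration but slow for large datasets.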

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
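
A minimal sketch of a bias-corrected Cramér's V along the lines of Bergsma's 2013 correction is given below. The function name is an illustrative assumption, and this document does not confirm that the report's own implementation matches it step for step. The example uses the Exter Qual and Kitchen Qual columns from the first ten sample rows only.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals, col_totals = Counter(a), Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row, n_row in row_totals.items():
        for col, n_col in col_totals.items():
            expected = n_row * n_col / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Corrected phi-squared and corrected table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. a constant variable
    return sqrt(phi2_corr / denom)

# First ten sample rows of this report (Exter Qual, Kitchen Qual).
exter_qual = ["Ex", "Gd", "TA", "TA", "Gd", "Gd", "TA", "Gd", "Gd", "Gd"]
kitchen_qual = ["Ex", "Gd", "TA", "Gd", "Gd", "Ex", "TA", "Gd", "Gd", "Gd"]
print(round(cramers_v_corrected(exter_qual, kitchen_qual), 3))
```

With only ten rows the estimate is very noisy; the point of the sketch is the sequence of steps, not the value.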

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

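(index) | Overall Qual | Gr Liv Area | Exter Qual | Garage Cars | Garage Area | Kitchen Qual | Total Bsmt SF | 1st Flr SF | Bsmt Qual | Full Bath | Year Built | Year Remod/Add | Garage Yr Blt | target
0 | 10 | 2392 | Ex | 3 | 968 | Ex | 2392 | 2392 | Ex | 2 | 2003 | 2003 | 2003 | 386250
1 | 7 | 1352 | Gd | 2 | 466 | Gd | 1352 | 1352 | Ex | 2 | 2006 | 2007 | 2006 | 194000
2 | 5 | 900 | TA | 1 | 288 | TA | 864 | 900 | TA | 1 | 1967 | 1967 | 1967 | 123000
3 | 5 | 1174 | TA | 2 | 576 | Gd | 680 | 680 | TA | 1 | 1900 | 2006 | 2000 | 135000
4 | 7 | 1958 | Gd | 3 | 936 | Gd | 1026 | 1026 | Gd | 2 | 2005 | 2005 | 2005 | 250000
5 | 8 | 1968 | Gd | 3 | 680 | Ex | 774 | 774 | Ex | 2 | 2009 | 2010 | 2009 | 269500
6 | 6 | 1478 | TA | 2 | 442 | TA | 1478 | 1478 | TA | 1 | 1957 | 1957 | 1957 | 156500
7 | 8 | 2524 | Gd | 2 | 542 | Gd | 2524 | 2524 | Gd | 2 | 1981 | 1981 | 1981 | 278000
8 | 8 | 2649 | Gd | 3 | 746 | Gd | 1479 | 1515 | Ex | 2 | 2001 | 2002 | 2001 | 421250
9 | 8 | 1440 | Gd | 2 | 467 | Gd | 1432 | 1440 | Ex | 2 | 2003 | 2003 | 2003 | 232500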

Last rows

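(index) | Overall Qual | Gr Liv Area | Exter Qual | Garage Cars | Garage Area | Kitchen Qual | Total Bsmt SF | 1st Flr SF | Bsmt Qual | Full Bath | Year Built | Year Remod/Add | Garage Yr Blt | target
1340 | 5 | 925 | TA | 2 | 484 | TA | 925 | 925 | TA | 1 | 1963 | 1963 | 1990 | 133500
1341 | 4 | 572 | TA | 1 | 200 | TA | 572 | 572 | TA | 1 | 1925 | 1950 | 1940 | 75000
1342 | 6 | 2614 | TA | 2 | 624 | TA | 1522 | 1548 | TA | 2 | 1974 | 1997 | 1974 | 240000
1343 | 5 | 960 | TA | 1 | 392 | Fa | 960 | 960 | TA | 1 | 1959 | 2000 | 1959 | 131750
1344 | 6 | 865 | TA | 1 | 216 | TA | 660 | 740 | TA | 1 | 1920 | 1995 | 1920 | 108500
1345 | 6 | 1756 | Gd | 2 | 422 | TA | 872 | 888 | Ex | 2 | 1996 | 1997 | 1996 | 204000
1346 | 9 | 2748 | Gd | 3 | 850 | Ex | 1850 | 1850 | Ex | 2 | 2006 | 2006 | 2006 | 390000
1347 | 5 | 1214 | TA | 1 | 318 | TA | 1214 | 1214 | TA | 2 | 1967 | 1967 | 1967 | 143000
1348 | 5 | 894 | TA | 2 | 440 | TA | 864 | 894 | Gd | 1 | 1974 | 1974 | 1974 | 131000
1349 | 5 | 907 | TA | 1 | 343 | TA | 907 | 907 | Gd | 1 | 1978 | 1978 | 1978 | 140000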

Duplicate rows

Most frequently occurring

(index) | Overall Qual | Gr Liv Area | Exter Qual | Garage Cars | Garage Area | Kitchen Qual | Total Bsmt SF | 1st Flr SF | Bsmt Qual | Full Bath | Year Built | Year Remod/Add | Garage Yr Blt | target | # duplicates
0 | 7 | 2787 | TA | 4 | 820 | TA | 1168 | 1168 | Ex | 4 | 2000 | 2000 | 2000 | 269500 | 2
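
A minimal sketch of how repeated records like the one above can be found: treat each row as a tuple and count identical tuples. The function name and the three-row toy input are illustrative assumptions; how pandas-profiling counts duplicates internally is not shown in this report.

```python
from collections import Counter

def find_duplicates(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: count for row, count in counts.items() if count > 1}

# The duplicated record from this report, entered twice, plus one other sample row.
duplicated = (7, 2787, "TA", 4, 820, "TA", 1168, 1168, "Ex", 4, 2000, 2000, 2000, 269500)
other = (5, 907, "TA", 1, 343, "TA", 907, 907, "Gd", 1, 1978, 1978, 1978, 140000)
rows = [duplicated, other, duplicated]

repeated = find_duplicates(rows)
for row, count in repeated.items():
    print(count, row)

# One duplicated record among 1350 observations rounds to the reported 0.1%.
share = 100 * 1 / 1350
print(f"{share:.1f}%")
```

Rows must be hashable for this approach, which is why tuples are used rather than lists.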